query performance prediction
Uncovering the Limitations of Query Performance Prediction: Failures, Insights, and Implications for Selective Query Processing
Chifu, Adrian-Gabriel, Déjean, Sébastien, Mothe, Josiane, Garouani, Moncef, Ortiz, Diego, Ullah, Md Zia
Query Performance Prediction (QPP) estimates a retrieval system's effectiveness for a given query, offering valuable insights for search effectiveness and query processing. Despite extensive research, QPPs face critical challenges in generalizing across diverse retrieval paradigms and collections. This paper provides a comprehensive evaluation of state-of-the-art QPPs (e.g. NQC, UQC), LETOR-based features, and newly explored dense-based predictors. Using diverse sparse rankers (BM25, DFree without and with query expansion), hybrid or dense rankers (SPLADE and ColBERT), and diverse test collections (ROBUST, GOV2, WT10G, and MS MARCO), we investigate the relationships between predicted and actual performance, with a focus on generalization and robustness. Results show significant variability in predictor accuracy, with collections as the main factor and rankers next. Some sparse predictors work to some extent on some collections (TREC ROBUST and GOV2) but do not generalize to others (WT10G and MS MARCO). While some predictors show promise in specific scenarios, their overall limitations constrain their utility for applications. We show that QPP-driven selective query processing offers only marginal gains, emphasizing the need for improved predictors that generalize across collections, align with dense retrieval architectures, and are useful for downstream applications.
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
- North America > United States (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
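For readers unfamiliar with NQC, one of the predictors named in the abstract above, here is a minimal sketch. It assumes the usual formulation of Normalized Query Commitment: the standard deviation of the top-k retrieval scores divided by the absolute score of the whole collection treated as a single document. The function name `nqc` and its parameters are illustrative and are not taken from the paper.

```python
import statistics


def nqc(top_scores, corpus_score):
    """Normalized Query Commitment for one query (illustrative sketch).

    top_scores   -- retrieval scores of the top-k documents for the query
    corpus_score -- score the ranker assigns to the whole collection,
                    used as a query-dependent normalizer
    """
    if corpus_score == 0:
        raise ValueError("corpus_score must be non-zero")
    # Population standard deviation: divide by k, not k - 1.
    return statistics.pstdev(top_scores) / abs(corpus_score)


if __name__ == "__main__":
    # A larger spread among the top scores yields a higher NQC value.
    print(nqc([4.0, 2.0], 2.0))
```

A higher value is commonly read as a sign that the ranker separates the top documents clearly from the rest, and so as a prediction of better retrieval quality.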
MERLIN: Multi-stagE query performance prediction for dynamic paRallel oLap pIpeliNe
Zhang, Kaixin, Wang, Hongzhi, Gu, Kunkai, Li, Ziqi, Zhao, Chunyu, Li, Yingze, Yan, Yu
High-performance OLAP database technology has emerged with the growing demand for massive data analysis. To achieve much higher performance, many DBMSs adopt sophisticated designs including SIMD operators, parallel execution, and dynamic pipeline modification. However, such advanced OLAP query execution mechanisms still lack targeted Query Performance Prediction (QPP) methods because most existing methods target conventional tree-shaped query plans and static serial executors. To address this problem, we propose MERLIN, a multi-stage query performance prediction method for high-performance OLAP DBMSs. MERLIN first establishes resource cost models for each physical operator. Then, it constructs a DAG that consists of a data-flow tree backbone and resource competition relationships among concurrent operators. After a GAT with an extra attention mechanism calibrates the cost, the cost vector tree is extracted and summarized by a TCN, ultimately enabling effective query performance prediction. Experimental results demonstrate that MERLIN yields higher performance prediction precision than existing methods.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Hawaii (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (7 more...)
- Information Technology > Databases (1.00)
- Information Technology > Data Science > Data Quality (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Identifying High Consideration E-Commerce Search Queries
Chen, Zhiyu, Choi, Jason, Fetahu, Besnik, Malmasi, Shervin
In e-commerce, high consideration search missions typically require careful and elaborate decision making, and involve a substantial research investment from customers. We consider the task of identifying High Consideration (HC) queries. Identifying such queries enables e-commerce sites to better serve user needs using targeted experiences such as curated QA widgets that help users reach purchase decisions. We explore the task by proposing an Engagement-based Query Ranking (EQR) approach, focusing on query ranking to indicate potential engagement levels with query-related shopping knowledge content during product search. Unlike previous studies on predicting trends, EQR prioritizes query-level features related to customer behavior, finance, and catalog information rather than popularity signals. We introduce an accurate and scalable method for EQR and present experimental results demonstrating its effectiveness. Offline experiments show strong ranking performance. Human evaluation shows a precision of 96% for HC queries identified by our model. The model was commercially deployed, and shown to outperform human-selected queries in terms of downstream customer impact, as measured through engagement.
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (2 more...)
- Information Technology > e-Commerce (1.00)
- Information Technology > Information Management > Search (1.00)
- Information Technology > Communications > Social Media (1.00)
- (4 more...)
Query Performance Prediction using Relevance Judgments Generated by Large Language Models
Meng, Chuan, Arabzadeh, Negar, Askari, Arian, Aliannejadi, Mohammad, de Rijke, Maarten
Query performance prediction (QPP) aims to estimate the retrieval quality of a search system for a query without human relevance judgments. Previous QPP methods typically return a single scalar value and do not require the predicted values to approximate a specific information retrieval (IR) evaluation measure, leading to certain drawbacks: (i) a single scalar is insufficient to accurately represent different IR evaluation measures, especially when metrics do not highly correlate, and (ii) a single scalar limits the interpretability of QPP methods because solely using a scalar is insufficient to explain QPP results. To address these issues, we propose a QPP framework using automatically generated relevance judgments (QPP-GenRE), which decomposes QPP into independent subtasks of predicting the relevance of each item in a ranked list to a given query. This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels. This also allows us to interpret predicted IR evaluation measures, and identify, track and rectify errors in generated relevance judgments to improve QPP quality. We predict an item's relevance by using open-source large language models (LLMs) to ensure scientific reproducibility. We face two main challenges: (i) excessive computational costs of judging an entire corpus for predicting a metric considering recall, and (ii) limited performance in prompting open-source LLMs in a zero-/few-shot manner. To solve the challenges, we devise an approximation strategy to predict an IR measure considering recall and propose to fine-tune open-source LLMs using human-labeled relevance judgments. Experiments on the TREC 2019-2022 deep learning tracks show that QPP-GenRE achieves state-of-the-art QPP quality for both lexical and neural rankers.
- Asia > China (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.05)
- Europe > Netherlands > South Holland > Leiden (0.04)
- (3 more...)
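The core idea in the abstract above, predicting an IR measure from generated relevance judgments used as pseudo-labels, can be sketched as follows. This is a simplified illustration, not the authors' implementation. It assumes the per-item judgments (for example, from an LLM judge) are already available as a list of 0/1 labels in rank order, and it covers only two precision-oriented measures. The function names are hypothetical.

```python
def precision_at_k(pseudo_labels, k):
    """Predicted precision@k from 0/1 pseudo-labels given in rank order."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(pseudo_labels[:k]) / k


def reciprocal_rank_at_k(pseudo_labels, k):
    """Predicted reciprocal rank@k: 1/rank of the first item judged relevant."""
    for rank, label in enumerate(pseudo_labels[:k], start=1):
        if label:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    # Hypothetical judgments for the top five items of one ranked list.
    labels = [0, 1, 1, 0, 1]
    print(precision_at_k(labels, 5), reciprocal_rank_at_k(labels, 5))
```

Because the prediction is built from per-item labels, the same labels can feed different measures, and a poor prediction can be traced back to the individual judgments that caused it.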
The Surprising Effectiveness of Rankers Trained on Expanded Queries
Anand, Abhijit, V, Venktesh, Setty, Vinay, Anand, Avishek
An important problem in text-ranking systems is handling the hard queries that form the tail end of the query distribution. The difficulty may arise due to the presence of uncommon, underspecified, or incomplete queries. In this work, we improve the ranking performance of hard or difficult queries without compromising the performance of other queries. First, we perform LLM-based query enrichment for training queries using relevant documents. Next, a specialized ranker is fine-tuned only on the enriched hard queries instead of the original queries. We combine the relevance scores from the specialized ranker and the base ranker, along with a query performance score estimated for each query. Our approach departs from existing methods that usually employ a single ranker for all queries, a choice biased towards the easy queries that form the majority of the query distribution. In our extensive experiments on the DL-Hard dataset, we find that a principled query-performance-based scoring method using the base and specialized rankers offers a significant improvement of up to 25% on the passage ranking task and up to 48.4% on the document ranking task when compared to the baseline performance of using original queries, even outperforming the SOTA model.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > New York > New York County > New York City (0.05)
- Europe > Netherlands > South Holland > Delft (0.05)
- (4 more...)
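The abstract above does not state how the two rankers' scores are combined with the query performance score. One plausible form, shown here purely as an assumption, is a convex combination in which the predicted performance of the query under the base ranker decides how much weight the base ranker keeps. The function `combine_scores` and the requirement that the QPP score lies in [0, 1] are hypothetical.

```python
def combine_scores(base_score, specialized_score, qpp_score):
    """Blend two relevance scores for one query-document pair.

    qpp_score is an assumed estimate in [0, 1] of how well the base ranker
    handles this query: easy queries (near 1) lean on the base ranker,
    hard queries (near 0) lean on the specialized ranker.
    """
    if not 0.0 <= qpp_score <= 1.0:
        raise ValueError("qpp_score must lie in [0, 1]")
    return qpp_score * base_score + (1.0 - qpp_score) * specialized_score


if __name__ == "__main__":
    # A query predicted to be hard: the specialized ranker dominates.
    print(combine_scores(base_score=0.2, specialized_score=0.9, qpp_score=0.1))
```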
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
Poesina, Eduard, Costache, Adriana Valentina, Chifu, Adrian-Gabriel, Mothe, Josiane, Ionescu, Radu Tudor
Text-to-image generation has recently emerged as a viable alternative to text-to-image retrieval, due to the visually impressive results of generative diffusion models. Although query performance prediction is an active research topic in information retrieval, to the best of our knowledge, there is no prior study that analyzes the difficulty of queries (prompts) in text-to-image generation, based on human judgments. To this end, we introduce the first dataset of prompts which are manually annotated in terms of image generation performance. In order to determine the difficulty of the same prompts in image retrieval, we also collect manual annotations that represent retrieval performance. We thus propose the first benchmark for joint text-to-image prompt and query performance prediction, comprising 10K queries. Our benchmark enables: (i) the comparative assessment of the difficulty of prompts/queries in image generation and image retrieval, and (ii) the evaluation of prompt/query performance predictors addressing both generation and retrieval. We present results with several pre-generation/retrieval and post-generation/retrieval performance predictors, thus providing competitive baselines for future research. Our benchmark and code are publicly available under the CC BY 4.0 license at https://github.com/Eduard6421/PQPP.
- Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.69)
- Leisure & Entertainment > Sports > Tennis (1.00)
- Information Technology (0.68)
Query Performance Prediction: From Ad-hoc to Conversational Search
Meng, Chuan, Arabzadeh, Negar, Aliannejadi, Mohammad, de Rijke, Maarten
Query performance prediction (QPP) is a core task in information retrieval. The QPP task is to predict the retrieval quality of a search system for a query without relevance judgments. Research has shown the effectiveness and usefulness of QPP for ad-hoc search. Recent years have witnessed considerable progress in conversational search (CS). Effective QPP could help a CS system to decide an appropriate action to be taken at the next turn. Despite its potential, QPP for CS has been little studied. We address this research gap by reproducing and studying the effectiveness of existing QPP methods in the context of CS. While the task of passage retrieval remains the same in the two settings, a user query in CS depends on the conversational history, introducing novel QPP challenges. In particular, we seek to explore to what extent findings from QPP methods for ad-hoc search generalize to three CS settings: (i) estimating the retrieval quality of different query rewriting-based retrieval methods, (ii) estimating the retrieval quality of a conversational dense retrieval method, and (iii) estimating the retrieval quality for top ranks vs. deeper-ranked lists. Our findings can be summarized as follows: (i) supervised QPP methods distinctly outperform unsupervised counterparts only when a large-scale training set is available; (ii) point-wise supervised QPP methods outperform their list-wise counterparts in most cases; and (iii) retrieval score-based unsupervised QPP methods show high effectiveness in assessing the conversational dense retrieval method, ConvDR.
- Asia > Taiwan > Taiwan Province > Taipei (0.05)
- Europe > Netherlands > North Holland > Amsterdam (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
Unsupervised Search Algorithm Configuration using Query Performance Prediction
Search engine configuration can be quite difficult for inexpert developers. Instead, an auto-configuration approach can be used to speed up development time. Yet, such an automatic process usually requires relevance labels to train a supervised model. In this work, we suggest a simple solution based on query performance prediction that requires no relevance labels but only a sample of queries in a given domain. Using two example use cases, we demonstrate the merits of our solution.
- North America > United States > New York > New York County > New York City (0.06)
- Asia > Middle East > Israel (0.04)
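The abstract above gives no procedural detail, so the following is only a sketch of how label-free configuration selection with a QPP signal might look. It assumes that each candidate configuration is run on the sample queries, a post-retrieval predictor scores each result list, and the configuration with the highest mean predicted performance is kept. The names `select_configuration`, `retrieve_scores` and `predictor` are hypothetical.

```python
def select_configuration(configs, queries, retrieve_scores, predictor):
    """Pick the configuration with the highest mean predicted performance.

    configs         -- candidate search configurations (any hashable values)
    queries         -- sample of queries from the target domain
    retrieve_scores -- callable (config, query) -> list of retrieval scores
    predictor       -- callable (scores) -> predicted performance (float)
    """
    if not configs or not queries:
        raise ValueError("need at least one configuration and one query")
    best_config, best_mean = None, float("-inf")
    for config in configs:
        predictions = [predictor(retrieve_scores(config, q)) for q in queries]
        mean_prediction = sum(predictions) / len(predictions)
        if mean_prediction > best_mean:
            best_config, best_mean = config, mean_prediction
    return best_config, best_mean


if __name__ == "__main__":
    # Toy score table standing in for real retrieval runs.
    table = {
        ("bm25", "q1"): [3.0, 1.0],
        ("bm25", "q2"): [2.0, 2.0],
        ("bm25_rm3", "q1"): [5.0, 1.0],
        ("bm25_rm3", "q2"): [4.0, 0.0],
    }
    spread = lambda scores: max(scores) - min(scores)  # stand-in predictor
    print(select_configuration(["bm25", "bm25_rm3"], ["q1", "q2"],
                               lambda c, q: table[(c, q)], spread))
```

No relevance labels appear anywhere in the loop. The quality of the chosen configuration depends entirely on how well the predictor tracks actual effectiveness in the target domain.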
Reimagining Search
Ever since Gerard Salton of Cornell University developed the first computerized search engine (Salton's Magical Automatic Retriever of Text, or SMART) in the 1960s, search developers have spent decades essentially refining Salton's idea: take a query string, match it against a collection of documents, then calculate a set of relevant results and display them in a list. All of today's major Internet search engines--including Google, Amazon, and Bing--continue to follow Salton's basic blueprint. Yet as the Web has evolved from a loose-knit collection of academic papers to an ever-expanding digital universe of apps, catalogs, videos, and cat GIFs, users' expectations of search results have shifted. Today, many of us have less interest in sifting through a collection of documents than in getting something done: booking a flight, finding a job, buying a house, making an investment, or any number of other highly focused tasks. Meanwhile, the Web continues to expand at a dizzying pace.